prov/efa: Add documentations #11792
Signed-off-by: Shi Jin <[email protected]>
| **Secondary Caps** |efa|efa-direct|
| ------------------ |:-:|:--------:|
| `FI_FENCE` |❌|❌ |
| `FI_MULTI_RECV` |❌ |✓ |
FI_MULTI_RECV should be the other way around
Good catch, forgot to update.
This doc compares the features and implementations of the efa and efa-direct fabrics.

Signed-off-by: Shi Jin <[email protected]>
bot:aws:retest
The **`efa` fabric** implements a comprehensive set of [wire protocols](efa_rdm_protocol_v4.md) that include emulations to support capabilities beyond what the EFA device natively provides. This allows broader libfabric feature support and application compatibility, but results in a more complex code path with additional protocol overhead.

The **`efa-direct` fabric** offers a more direct approach that mostly exposes only what the EFA NIC hardware natively supports. This results in a more compact and efficient code path with reduced protocol overhead, but requires applications to work within the constraints of the hardware capabilities.
Why "mostly"?
We expose counters which are not natively supported at the moment
**Tx Post:**
- Constructs Work Queue Entry (WQE) directly from application calls (`fi_*` functions)
- Maintains 1-to-1 mapping between WQE and libfabric call
- Only performs two operations before data is sent over wire:
Send to the NIC, not sent over wire
**Tx Post:**
- Allocates internal data structure called `efa_rdm_ope` (EFA-RDM operational entry)
- Maintains 1-to-1 mapping between `efa_rdm_ope` and libfabric call (`fi_*` functions)
- Chooses appropriate protocol based on operation type and message size
And EFA NIC capabilities
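To make the point in this thread concrete, here is a minimal sketch of a selection that looks at message size and at one device capability. Everything in it is an assumption for illustration: the `example_` names, the three protocol labels (loosely modeled on the message protocol families in the linked protocol document), the thresholds, and the decision order are not taken from the provider code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol labels, for illustration only. */
enum example_proto {
    EXAMPLE_PROTO_EAGER,     /* whole message fits in one packet */
    EXAMPLE_PROTO_LONG_CTS,  /* sender pushes the data in several packets */
    EXAMPLE_PROTO_LONG_READ  /* receiver pulls the data with RDMA read */
};

/* Pick a protocol from the message size and a device capability flag.
 * Both thresholds are caller-supplied, made-up parameters. */
enum example_proto example_select_proto(size_t msg_len,
                                        size_t max_eager_len,
                                        size_t min_read_len,
                                        bool device_has_rdma_read)
{
    assert(max_eager_len > 0);

    if (msg_len <= max_eager_len)
        return EXAMPLE_PROTO_EAGER;
    if (device_has_rdma_read && msg_len >= min_read_len)
        return EXAMPLE_PROTO_LONG_READ;
    return EXAMPLE_PROTO_LONG_CTS;
}
```

With this shape, the same message length can resolve to a different protocol on a device without RDMA read, which is the dependency the comment asks the doc to mention.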
- Each `pke` corresponds to a WQE that interacts with EFA device
- One operation entry can map to multiple packet entries (e.g., a 16KB message can be sent via 2 packet entries)
- **Note**: For RMA operations (`fi_read`/`fi_write`), such workflow still applies, but when device RDMA is available, the data goes directly to/from user buffers without internal staging or copying. Since efa fabric supports unlimited size for RMA, when the libfabric message is larger than the max rdma size of the device, it consume use multiple packet entries.
"it consume use multiple packet entries", did you mean it consumes and uses?
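The one-operation-to-many-packet-entries mapping in the hunk above comes down to ceiling division over a per-entry size limit. A minimal sketch, assuming hypothetical helper names and a caller-supplied limit (the actual device limits are not stated in this thread):

```c
#include <assert.h>
#include <stddef.h>

/* Number of packet entries needed to cover `len` bytes when each entry
 * carries at most `max_seg` bytes. Written as divide-plus-remainder so it
 * cannot overflow for large `len`. */
size_t example_num_pkt_entries(size_t len, size_t max_seg)
{
    assert(max_seg > 0);
    return len / max_seg + (len % max_seg != 0);
}

/* Length of the i-th segment (0-based). Every segment is full-sized
 * except possibly the last one. */
size_t example_seg_len(size_t len, size_t max_seg, size_t i)
{
    size_t off = i * max_seg;

    assert(max_seg > 0 && off < len);
    return (len - off < max_seg) ? len - off : max_seg;
}
```

With a hypothetical per-entry payload of 8192 bytes, `example_num_pkt_entries(16384, 8192)` gives 2, which lines up with the 16KB example in the doc.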
- Both support FI_EP_RDM for reliable datagram.
- FI_EP_DGRAM is only supported by efa fabric. Though it uses the same code path as efa-direct, it is kept in efa fabric for backward compatibility.
This doesn't explain why it is not supported in efa-direct. The reason is there is no large overhead in "efa" fabric with this, right?
| **Modes** |efa|efa-direct|
Why is the efa column empty?
| **Other Libfabric Features** |efa|efa-direct|
| ---------------------------- |:-:|:--------:|
| FI_RM_ENABLED |\*|\* |
| fi_counter |✓ |✓ |
Do we support counters in efa-direct?
yes
EFA->>Device: Poll device CQ
Device->>EFA: Device completion (pke)
EFA->>EFA: Find ope from pke
EFA->>Queue: Search Rx queue for match
You only addressed RX CQEs here, maybe we should add TX ones as well (each in optional)
**Rx Post:**
- Pre-posts internal Rx buffers to device for incoming data from peers
- User buffers from `fi_recv` calls are queued in internal libfabric queue (not posted to device)
They can be if we emulate the send/recv with RDMA read
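For readers following the Rx discussion, here is a minimal sketch of the "queue user receives, then search the queue for a match when a packet arrives" idea from the hunk above. The struct, the function names, and the wildcard constant are hypothetical, matching is reduced to the source address, and the RDMA-read emulation case raised in this comment is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ADDR_ANY UINT64_MAX /* hypothetical "any source" wildcard */

/* Hypothetical record of one posted user receive. */
struct example_rx_entry {
    uint64_t src_addr;              /* expected sender or EXAMPLE_ADDR_ANY */
    void *user_buf;
    size_t len;
    struct example_rx_entry *next;
};

/* Queue a user receive at the tail of the internal list. Nothing is
 * posted to the device here. */
void example_rx_post(struct example_rx_entry **head,
                     struct example_rx_entry *e)
{
    assert(e != NULL);
    e->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = e;
}

/* On an incoming packet from `src`, unlink and return the oldest queued
 * receive that accepts it, or NULL when nothing matches. */
struct example_rx_entry *example_rx_match(struct example_rx_entry **head,
                                          uint64_t src)
{
    for (; *head; head = &(*head)->next) {
        struct example_rx_entry *e = *head;

        if (e->src_addr == EXAMPLE_ADDR_ANY || e->src_addr == src) {
            *head = e->next;
            e->next = NULL;
            return e;
        }
    }
    return NULL;
}
```

Walking the list from the head keeps matching in posting order, so an earlier receive wins when more than one would accept the packet.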
- Polls device CQ for completion of packet entries posted to EFA device
- Finds corresponding operation entries stored in packet entry structures
- Uses counters and metadata in operation entry to track completion progress
- Generates libfabric completion when operation entry has all required data
We also copy and stage the CQEs - which is another difference between efa and efa-direct
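The completion bullets above can be sketched as a byte counter on the operation entry that is advanced once per packet-entry completion. The structs below are simplified, hypothetical stand-ins (the real `efa_rdm_ope` and `pke` carry far more state), and the CQE staging mentioned in this comment is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an operation entry. */
struct example_ope {
    size_t total_len;       /* bytes the libfabric operation expects */
    size_t bytes_completed; /* bytes covered by completed packet entries */
};

/* Hypothetical stand-in for a packet entry that points back at its ope. */
struct example_pke {
    struct example_ope *ope;
    size_t payload_len;
};

/* Handle one device completion. Returns true once the operation entry
 * has all of its data, i.e. when a libfabric completion is due. */
bool example_handle_pke_completion(struct example_pke *pke)
{
    struct example_ope *ope = pke->ope;

    assert(ope->bytes_completed + pke->payload_len <= ope->total_len);
    ope->bytes_completed += pke->payload_len;
    return ope->bytes_completed == ope->total_len;
}
```

For a 16384-byte operation split across two 8192-byte packet entries, the first call returns false and the second returns true, so exactly one libfabric completion would be generated.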
***

| **Endpoint Types** |efa|efa-direct|
Would you like to add these here? Because it would be a duplicate of https://github.com/ofiwg/libfabric/wiki/Provider-Feature-Matrix and we would need to maintain it in both places
participant Provider as EFA Provider
participant Application

Note over DeviceCQ,Application: Optimized Completion Path
Add lines here
Application->>Provider: Poll for completion
Provider->>DeviceCQ: Poll for completion